Systems and methods for deriving leading indicators of economic activity using predictive analytics

ABSTRACT

Predictive analytics techniques are used to produce leading indicators of economic activity based on factors determined from a range of available data sources, such as public and/or private transportation data. A fee-based subscription system may be provided for the sharing of leading indicators to users. A consistent, semantic metadata structure is described as well as a hypothesis generating and testing system capable of generating predictive analytics models in a non-supervised or partially supervised mode.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a Continuation of U.S. patent application Ser. No. 16/797,640, entitled “SYSTEMS AND METHODS FOR DERIVING LEADING INDICATORS OF FUTURE MANUFACTURING PRODUCTION, AND CONSUMPTION OF GOODS AND SERVICES” filed Feb. 21, 2022, which will issue as U.S. Pat. No. 11,361,202 on Jun. 14, 2022, the entire disclosure of which is hereby incorporated by reference.

TECHNICAL FIELD

The present invention relates, generally, to systems and methods for modeling complex economic and commercial systems and, more particularly, to the application of predictive analytics and machine learning, deep learning, active learning and other artificial intelligence techniques to the prediction of future economic performance.

BACKGROUND

Historically, the prediction of future economic activity has been based on lagging indicators, such as historical trends in retail sales, industrial production, average prime interest rates, unemployment levels, wage growth, interest rates, inflation rates, monthly housing starts, broadband internet penetration, crop planting and yield reports, and similar economic metrics. These indices are important tools for a wide range of individuals and financial institutions, particularly securities traders, fund managers, government regulators, and the like.

Currently known methods for predicting economic activity and associated metrics are unsatisfactory in a number of respects, in part because they rely on intuition, heuristics, publicly available documents (e.g., aggregated quarterly reports for a sector), and/or “technical analysis” of numerical trend data. Trades, positions, and other investment decisions are often based on real-time inferences drawn from these indices. Managers of high-asset-value funds know that beating a forecast by even a fraction of a percentage point, or receiving pertinent information just seconds earlier than the manager of a competing fund, can instantly result in billions of dollars gained (or lost). Thus, fund managers relentlessly seek real-time “hints” to incrementally sharpen their vision of future economic performance.

More generally, the complexity of unstable market forces, combined with the not-always-rational behavior of market participants, suggests that no single factor can reliably predict future performance, even within a discrete market sector. Rather, multiple factors often coalesce within a fire hose of disparate data streams and datasets, rendering them beyond the grasp of even the most astute financial scholars. Indeed, one estimate suggests that, by the year 2025, global data production will exceed 460 exabytes (1 exabyte = 10¹⁸ bytes) per day. It is simply not possible for human beings to perform traditional hypothesis testing on data of these magnitudes.

Thus, there is a long-felt need for improved methods of identifying, collecting, synthesizing, and processing groups of data to correctly predict the value and/or state of variables which characterize future economic activity or behavior, as well as for improved ways of discovering, vetting, testing, and quantifying the correlations among the factors upon which predictive models are built.

SUMMARY OF THE INVENTION

Various embodiments of the present invention relate to systems and methods for, inter alia: i) identifying factors that tend to influence future levels of manufacturing, production, and consumption of goods and services; ii) identifying and characterizing correlations among the factors; iii) identifying and harnessing sources of data to facilitate quantitative analysis of the factors; iv) applying predictive analytics techniques to the factors, the correlations among the factors, the data streams and datasets associated with the factors, the quantitative analysis of the data, and the correlations among datasets; v) using information derived from the application of predictive analytics to yield potential leading indicators of economic activity based on datasets selected from a range of available data sources; vi) applying such predictive analytics techniques to produce leading indicators based on fee-based or non-subscription data sources such as publicly available and/or private data feeds (e.g., transportation data); vii) producing such leading indicators based on data sources associated with the direct observation of activity at one or more maritime facilities (e.g., maritime shipping lanes); viii) producing such leading indicators based on ground-based (terrestrial), aerial, or space-based images of consumer or commercial vehicles; ix) providing a fee-based subscription system for selectively sharing the leading indicators with stakeholders; x) providing a consistent, semantic metadata structure for onboarding and managing data employed for such uses; xi) generating and testing hypotheses utilizing machine learning, deep learning, active learning, and/or other artificial intelligence techniques; and xii) applying supervised and/or non-supervised learning techniques in the context of a hypothesis generating and testing system to recursively refine datasets and known correlations among them, identify new data sources, identify or refine new correlations, and develop or refine existing analytic and predictive models.

BRIEF DESCRIPTION OF THE DRAWING FIGURES

The present invention will hereinafter be described in conjunction with the appended drawing figures, wherein like numerals denote like elements, and:

FIG. 1 is a conceptual block diagram illustrating a predictive analytics system in accordance with various embodiments;

FIG. 2 is a conceptual block diagram and process flow for a predictive analytics system in accordance with various embodiments;

FIG. 3 is a top-down view of a port environment useful in describing various embodiments;

FIGS. 4A, 4B and 5 illustrate a marine vessel in accordance with various embodiments;

FIGS. 6A and 6B illustrate aerial views of a parking lot showing variations in the number of parked consumer vehicles;

FIG. 7 is a flowchart illustrating a method in accordance with various embodiments; and

FIG. 8 is a conceptual block diagram of a hypothesis generation and testing system (HGTS) in accordance with various embodiments.

DETAILED DESCRIPTION OF PREFERRED EXEMPLARY EMBODIMENTS

The present disclosure relates to improved techniques for identifying and quantifying leading indicators of future levels of manufacturing, production, and consumption of goods and services, and to machine learning systems and methods for predicting economic activity and producing actionable leading indicators of that activity. Furthermore, the present subject matter presents a generalized hypothesis generating and testing system that may be employed for the purposes of generating such leading indicators. In addition, the present subject matter describes methods by which users may subscribe to proprietary services to gain access to the indicators. In that regard, the following detailed description is merely exemplary in nature and is not intended to limit the inventions or the application and uses of the inventions described herein. Furthermore, there is no intention to be bound by any theory presented in the preceding background or the following detailed description. In the interest of brevity, conventional techniques and components related to economic indicators, securities trading, market activity, machine learning models, and data analytics techniques may not be described in detail herein.

Referring first to the conceptual block diagram of FIG. 1, various embodiments may be implemented in the context of a predictive analytics system (“PAS”) 100. As illustrated, PAS 100 includes a data preparation module 130 configured to receive data from a wide range of data sources 101 (e.g., 111-123), some of which might be public (110), as described in more detail below, and others that might be private (120). Data preparation module 130 is generally configured to clean and otherwise process the data received from data sources 101 and subsequently store the processed data in a data warehouse 140. Two analytics modules (150, 160) and a comparison module 170 interact with data warehouse 140 in order to produce, using the factors and predictor variables derived from data sources 101, an output 180 which, as described in further detail below, corresponds to a target variable representing a microeconomic and/or macroeconomic leading indicator.

Data Sources

As a preliminary matter, it will be understood that the data received from data sources 101 may take a variety of forms and may exhibit or embody one or more of a wide range of attributes. Thus, for example, the data will often take the form of discrete or continuous numerical data (e.g., integers, floating point numbers, etc.) or categorical data (e.g., unordered or ordered categories). The data may be provided, for example, as a series of time-varying scalar values, as matrices or tables, as an arbitrary tensor of a given dimensionality, or in any other form now known or later developed. Data sources 101 and the datasets derived therefrom thus supply the predictor variables used for producing the associated machine learning model(s). In that regard, the phrase “machine learning model” may be used to refer to a set of individual machine learning models, each having a different type and purpose. For example, a convolutional neural network (CNN) model may be used to perform object detection and classification in a scene, while a separate CNN may be used to determine the speed and/or acceleration of an object in the scene.

In addition to pre-processed summary and descriptive statistics, data sources 101 may represent the output of one or more sensors configured to determine a binary or non-binary value or state of the environment and provide a raw or processed stream of information relating to that environment (or objects within the environment). Non-limiting examples of such sensors include optical cameras (for providing discrete or sequential data frames representing images or video feeds), infrared cameras, LIDAR sensors (producing point clouds of objects in the environment), SONAR and RADAR sensors, microphones, acoustic mapping sensors, natural language processing systems, vibration or weather-related sensors (temperature, humidity, seismic activity, etc.), proximity sensors, gas detectors, pressure sensors, soil monitoring sensors, and the like.

Data sources 101 may be categorized as either public (110) or private (120). As used herein, public or “open” data sources are those sources of data that are generally available free of charge (or for a nominal fee) to individuals and/or corporate entities, typically in electronic form via the Internet. Such public data sources may be provided, in many instances, by governmental entities, but might also be provided by (or otherwise available from) private companies. In contrast, “private” data sources are those that are fee-based, behind a paywall, or otherwise require a form of permission for access. With respect to both public and private data sources, the data itself may be anonymized, pseudonymized, or provided in accordance with a differential privacy (DP) protocol.
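As a minimal illustration of pseudonymization, one of the options named above, a data provider could replace raw identifiers such as license plates or account numbers with a keyed hash before the data reaches data preparation module 130. The sketch assumes HMAC-SHA256 with a provider-held secret key; the function name `pseudonymize`, the key, and the sample records are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a raw identifier with a keyed, deterministic pseudonym.

    The same identifier and key always yield the same pseudonym, so records can
    still be joined across datasets. Without the key, a third party cannot
    recompute pseudonyms from guessed identifiers.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    key = b"provider-held-secret"  # hypothetical key; never shipped with the data
    records = [{"plate": "ABC123", "lot": 601}, {"plate": "XYZ789", "lot": 601}]
    for rec in records:
        rec["plate"] = pseudonymize(rec["plate"], key)
    print(records)
```

Because the mapping is deterministic for a given key, the same vehicle or vessel can still be followed across feeds without exposing its raw identifier.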

Non-limiting examples of public data sources include: (1) social media feeds (e.g., Facebook, Twitter, YouTube, Instagram, Tumblr, Pinterest, Facebook Graph API, LinkedIn, Social Mention, WeChat, Baidu, Google Trends, etc.); (2) mapping and satellite data (Google Maps, Google Earth, MapQuest, Bing Maps, Apple Maps, etc.); (3) camera feeds of municipal street, highway, or port-of-entry traffic, waterway, shipping dock, or seaport vessel traffic, airport commercial or passenger aircraft traffic, railway commercial or passenger traffic, and pedestrian traffic; (4) open datasets relating to global issues, such as the World Health Organization (WHO) Open data repository, Google Public Data Explorer, the European Union Open Data Portal, and the U.S. Census Bureau; (5) financially focused open data sources such as the UN Comtrade Database, the International Monetary Fund (IMF) datasets, the U.S. Bureau of Economic Analysis, the U.S. Securities and Exchange Commission, the National Bureau of Economic Research, and World Bank Open Data; (6) crime data, such as the FBI Uniform Crime Reporting Program and the National Archive of Criminal Justice Data (NACJD); (7) academic datasets, such as Google Scholar, Pew Research Center, and the National Center for Education Statistics; (8) environmental data, such as Climate Data Online (CDO), the National Center for Environmental Health (NCEH), and the IEA Atlas of Energy; and (9) business directory data, such as Glassdoor, Yelp, LinkedIn, Open Corporates, and the like.

Private data sources may include, for example, Bloomberg, Capital IQ, and Thomson Reuters financial databases, as well as other subscription-based access to audio, video, or image feeds that may be provided by various entities. Additional data feeds might include onboard vehicle camera data provided by commercial or passenger vehicle manufacturers or by third-party or public transportation operators, and security camera feeds provided by commercial or residential real estate property owners or their business tenants. Location and movement tracking information may be derived from global positioning systems, mobile phone or radio tower tracking, and from retail order, banking, credit card, debit card, mobile phone, social media-based, cryptocurrency, purchase, or currency exchange transaction information. Private data sources might also include purchase, shipping, and receiving tracking information for parcels, goods, and services; crop planting, harvesting, and yield information; livestock breeding, fishing, herding, or other animal production or slaughter information; raw material or refined goods production, storage, sale, or trading information, such as for crude and refined oil or natural gas, as well as minerals, lumber, cement, or other physical materials; service call records; and equipment utilization or tracking information.

In accordance with edge-computing or cloud-computing principles, video and/or still image data may be processed or analyzed to extract relevant information prior to being provided to data preparation module 130. For example, object detection and classification may be performed on images using an appropriately trained convolutional neural network (CNN) model, and the resulting classifications and/or regression parameters may be stored as metadata, as described in further detail below. In other embodiments, data sources 101 provide raw, unprocessed data that is handled by data preparation module 130.
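The following is a minimal sketch of how edge-side detections might be reduced to compact metadata before transmission. It assumes that some upstream CNN detector emits one (class label, confidence) pair per detected object. The `Detection` type, the 0.5 confidence threshold, and the output fields are illustrative assumptions and do not represent a defined interface of system 100.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # class predicted by an upstream detector, e.g. "truck"
    confidence: float  # detector score in [0, 1]


def summarize_detections(detections, min_confidence=0.5):
    """Count detections per class, keeping only those at or above the threshold."""
    counts = Counter(d.label for d in detections if d.confidence >= min_confidence)
    return {"min_confidence": min_confidence, "counts": dict(counts)}


if __name__ == "__main__":
    frame = [
        Detection("truck", 0.91),
        Detection("truck", 0.42),  # dropped: below the threshold
        Detection("car", 0.77),
        Detection("car", 0.66),
    ]
    print(summarize_detections(frame))  # one truck and two cars are counted
```

Sending per-class counts rather than raw frames keeps bandwidth low. It also means the threshold used becomes part of the recorded metadata.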

Example 1—Transportation

While the present invention may be utilized with a wide range of data sources, in accordance with one embodiment, system 100 is used to make inferences based on observations relating to transportation activity, such as the behavior and cargo of marine vessels, commercial trucks, and the like, and to produce from those inferences an appropriate leading indicator output 180 (e.g., a temporally leading indicator).

In that regard, FIG. 3 depicts an example marine port environment 301 as seen in aerial view. More particularly, a shipping vessel 350 is seen navigating a waterway close to the shore, where a number of sensors 310 have been deployed. In one embodiment, for example, sensors 310 are cameras (e.g., a combination of IR and optical cameras) whose positions and orientations give them a view of the approaching vessel 350. Thus, sensors 310 correspond to one or more data sources 101 as illustrated in FIG. 1, and may be considered a public data source (e.g., a stationary Internet webcam) or a private data source (e.g., camera access provided for a fee). Sensors 310, when used in conjunction with predictive analytics system 100, are capable of observing a number of attributes of vessel 350.

For example, referring now to FIGS. 4A and 4B, sensors 310 may be used to produce images that can be analyzed to determine the position of vessel 350 in the water (e.g., before and after loading its cargo). Comparing FIGS. 4A and 4B, it can be seen that the vertical linear dimension z₁ shown in FIG. 4A is greater than z₂ in FIG. 4B. From this, a properly trained model, knowing the tonnage of vessel 350, may be used to determine the weight of the loaded goods. In cases where the number of observed shipping containers is largely unchanged, this information can be used to infer how many of the shipping containers are empty and how many are full.
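The z₁ to z₂ comparison can be made concrete with a first-order hydrostatic estimate. This is a swapped-in physics approximation, not the trained model described above. It assumes a wall-sided hull near the waterline, so the waterplane area stays constant over the change in draft. In that case the added mass is approximately the water density times the waterplane area times (z₁ − z₂). Waterplane area is used here as the known vessel parameter in place of tonnage, and the vessel figures in the example are hypothetical.

```python
SEAWATER_DENSITY_KG_M3 = 1025.0  # typical value; varies with salinity and temperature


def cargo_mass_from_freeboard(z1_m: float, z2_m: float, waterplane_area_m2: float,
                              water_density: float = SEAWATER_DENSITY_KG_M3) -> float:
    """Estimate added mass (kg) from the drop in freeboard between two observations.

    Assumes a wall-sided hull near the waterline: the freeboard decrease equals
    the draft increase, and added displacement = density * area * (z1 - z2).
    """
    if z2_m > z1_m:
        raise ValueError("freeboard increased; the vessel was lightened, not loaded")
    delta_draft_m = z1_m - z2_m
    return water_density * waterplane_area_m2 * delta_draft_m


if __name__ == "__main__":
    # Hypothetical vessel: 6,000 m^2 waterplane, freeboard drops from 12.0 m to 10.5 m.
    mass_kg = cargo_mass_from_freeboard(12.0, 10.5, 6000.0)
    print(f"{mass_kg / 1000:.0f} tonnes")  # 9225 tonnes
```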

While not illustrated in the drawings, a further machine learning or deep learning model (e.g., a convolutional neural network) may be used to identify the nationality, corporate owner, and/or unique identity of vessel 350 via its shape, signs, colors, and markings. In this way, the unique features of vessel 350 allow it to be tracked from port to port. In addition, to the extent that the shipping containers themselves are visible, a further model may be used to determine the type and origin of the goods themselves.

In addition to the side-view images provided by sensors 310, other available data may be employed to characterize the behavior of vessel 350. For example, referring now to FIG. 5, the wake region 352 extending from the stern of vessel 350 may be analyzed from a top-view image (as might be available from satellite data). That is, both the dispersion angle θ and the wake distance x may be correlated to the amount of water being displaced over time, which correlates directly with the weight and speed of the vessel, which in turn relates to the amount of goods being transported.

Example 2—Parking Lot Observations

In accordance with another embodiment, system 100 may be used to make inferences based on parking patterns associated with retail store locations. Referring to FIGS. 6A and 6B, for example, the type and number of vehicles parked near a business 610 may be analyzed over time from aerial-view images. Consider FIG. 6A, which illustrates a relatively large number of vehicles 605 within parking lot 601. At some later time (e.g., the same time and day a week later), as illustrated in FIG. 6B, far fewer vehicles 605 are visible in the same parking lot (602). This may be used to infer that business 610, and perhaps even the whole sector of which business 610 is a member, is likely to experience, and eventually report to the market, a downturn in business activity.
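The week-over-week comparison can be sketched as a percentage change in vehicle counts. The two observations are taken at the same time and day so that weekly seasonality roughly cancels. The −20% downturn threshold and the counts below are hypothetical. In practice a single lot is a noisy signal, and many lots or many weeks would be combined before drawing a sector-level inference.

```python
def pct_change(previous: int, current: int) -> float:
    """Percentage change in vehicle count between two comparable observations."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return 100.0 * (current - previous) / previous


def flag_downturn(previous: int, current: int, threshold_pct: float = -20.0) -> bool:
    """Flag a possible downturn when the count falls by at least |threshold_pct| percent."""
    return pct_change(previous, current) <= threshold_pct


if __name__ == "__main__":
    # Hypothetical counts: lot 601 (FIG. 6A) and the same lot a week later (FIG. 6B).
    week_1, week_2 = 140, 63
    print(pct_change(week_1, week_2))     # -55.0
    print(flag_downturn(week_1, week_2))  # True
```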

Further in accordance with this embodiment, the actual make and model of vehicles within parking lots 601, 602 may be used to infer economic activity and the demographics or socio-economic status of the business's patrons or customers. For example, consider the case in which business 610 provides very low-cost furniture. It might be determined, via a predictive analytics model, that the average value of the vehicles in parking lot 601 correlates to the overall economic health of the community. More particularly, in one year, the mean price (determined via publicly available data) of the vehicles might be $7,000, with a standard deviation of $3,000. In a subsequent year, in which the upper middle class (i.e., a group of individuals who would not typically shop at business 610) might be struggling financially, the mean price of the vehicles might increase to $24,000, with a standard deviation of $20,000. This information could prove valuable not only to a financial analyst, fund manager, stock trader, etc., but also to the business owner whose customer demographic is changing.
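The following sketch computes the summary statistics used in the example above. It takes as input a list of price estimates for the vehicles identified in the lot. The three-vehicle samples are hypothetical and were chosen only so that the results reproduce the illustrative figures in the text. The sample (n − 1) standard deviation is an assumed choice.

```python
from statistics import mean, stdev


def price_profile(prices):
    """Return the sample mean and sample standard deviation of vehicle prices."""
    if len(prices) < 2:
        raise ValueError("need at least two prices for a sample standard deviation")
    return mean(prices), stdev(prices)


if __name__ == "__main__":
    # Hypothetical estimated prices (USD) of vehicles identified in lot 601.
    year_1 = [4000, 7000, 10000]
    year_2 = [4000, 24000, 44000]
    print(price_profile(year_1))  # mean 7,000 and standard deviation 3,000
    print(price_profile(year_2))  # mean 24,000 and standard deviation 20,000
```

A rising mean combined with a much wider spread is the pattern the text associates with higher-income shoppers trading down.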

Example 3—Agriculture

In accordance with another embodiment, system 100 may be used to make inferences relating to agricultural economic indicators. In such cases, data sources 101 might include, without limitation: (1) satellite images of farms as they change over time, which can be used to determine the types of crops being planted regionally, the timing of harvests, the impact of weather on crops, etc.; (2) publicly available information from government agencies regarding crop yields and crop types by region; (3) public data on regional climate patterns, such as rainfall, freezing temperatures, droughts, floods, snow, fires, and the like; (4) sensing data from individual farms or groups of farms; (5) agricultural commodity market prices for local, regional, and international markets; (6) road camera images that show transportation of harvested and processed goods over time; and (7) satellite imagery illustrating crop storage. In accordance with this embodiment, output 180 may relate to commodity price predictions or any other related agricultural indices.

Data Preparation

Referring again to FIG. 1, data preparation module 130 includes any suitable combination of hardware, software, and firmware configured to receive data from data sources 101 and process the data for further analysis by system 100. The processing required by each data source 101 will generally vary widely, depending upon the nature of the data and the use that will be made of the data. Such processing might typically include, for example, data cleaning (e.g., removing, correcting, or imputing data), data transformation (e.g., normalization, attribute selection, discretization), and data reduction (numerosity reduction, dimensionality reduction, etc.). These techniques are well known in the art and need not be described in further detail herein.
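Three of the named operations are sketched below: imputation (a cleaning step), and min-max normalization and discretization (transformation steps). The choices shown are common defaults but are assumptions on our part: median imputation, rescaling to [0, 1], and fixed bin edges. The series name `daily_truck_counts` is hypothetical. Data reduction is not illustrated.

```python
from statistics import median


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges):
    """Map each value to the index of the first edge it falls below (len(edges) if none)."""
    return [next((i for i, e in enumerate(edges) if v < e), len(edges)) for v in values]


if __name__ == "__main__":
    daily_truck_counts = [120, None, 80, 100, None, 140]
    filled = impute_missing(daily_truck_counts)  # missing entries become the median, 110.0
    scaled = min_max_normalize(filled)
    print(filled)
    print(discretize(scaled, edges=[0.33, 0.67]))  # 0 = low, 1 = medium, 2 = high
```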

In addition to standard data pre-processing techniques, data preparation module 130 may also assign a consistent form of metadata to the received data streams. That is, one limitation of prior art data analysis techniques is that data sources 101 are available in a variety of forms. Sometimes the “meaning”, context, or semantic content of the data is clear (e.g., data table fields with descriptive labels), but in other cases the data may not include a data description and/or might include non-intuitive terms of art. Accordingly, one advantage of the present invention is that it provides a consistent metadata structure and syntax used to characterize the data and facilitate future analysis (e.g., using the hypothesis generating and testing system described below). In one embodiment, this metadata structure is a fundamental and critical enabler of assisted learning and/or unsupervised learning techniques. The metadata may take a variety of forms (e.g., XML, RDF, or the like), and may include any number of descriptive fields (e.g., time, date range, geographical location, number of shipping containers observed, nationality of vessel, sensing system make and model information as well as sensitivity or resolution, and analytic methods or model revision information used to perform any cleansing, analysis, etc.).
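Below is a minimal sketch of a metadata record serialized as XML. Its fields mirror the examples listed in the paragraph above. The element and field names, and the sample values, are hypothetical; the text does not define a specific schema. An RDF serialization would be an equally valid choice.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    # Hypothetical field names mirroring the example descriptive fields.
    source_id: str
    start: str                # ISO 8601 start of the observation window
    end: str                  # ISO 8601 end of the observation window
    latitude: float
    longitude: float
    containers_observed: int
    vessel_nationality: str
    sensor_model: str
    sensor_resolution: str
    processing_model: str     # analytic method or model revision used for cleansing/analysis


def to_xml(meta: DatasetMetadata) -> str:
    """Serialize the metadata record as a flat XML element."""
    root = ET.Element("datasetMetadata")
    for name, value in asdict(meta).items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    meta = DatasetMetadata(
        source_id="sensor-310-a", start="2021-03-01T00:00:00Z", end="2021-03-01T23:59:59Z",
        latitude=0.0, longitude=0.0, containers_observed=412, vessel_nationality="PA",
        sensor_model="example-ir-cam", sensor_resolution="1920x1080",
        processing_model="vessel-detector-v3",
    )
    print(to_xml(meta))
```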

Comparison Module

Comparison module 170 is generally configured to compare the predictions made via the leading indicator output 180 to the actual, ground-truth values that occur over time. In this way, comparison module 170 assists in validating models as well as signaling to data source analytics modules 160 and 150 that a particular model may need to be further tuned or replaced altogether with a different or more refined version of the model.

In some embodiments, comparison module 170 monitors the predictive power of output 180 and takes an action when the correlation coefficient or other statistical metric of the model falls below some minimum correlation, accuracy, or precision level. That action might include, for example, re-running the model on new data, using a different predictive model, and/or temporarily stopping production of a given output 180. In some embodiments, the hypothesis generating and testing system 800 (described below) may be used to train, validate, and test a new model based on its hypothesis testing results.
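The following sketches the monitoring rule. It computes the Pearson correlation between recent predictions and the ground truth that was later observed, and returns an action when that correlation drops below a floor. The 0.6 floor and the action labels are hypothetical placeholders. A real deployment might monitor accuracy or precision instead, as the text notes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; treated as no correlation
    return cov / (sx * sy)


def review_model(predicted, actual, min_r=0.6):
    """Return a (hypothetical) action label based on recent predictive power."""
    if pearson(predicted, actual) >= min_r:
        return "keep"
    # Below the floor: suspend this output and request retraining (e.g., via HGTS 800).
    return "suspend_and_retrain"


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(review_model(predicted, [1.1, 1.9, 3.2, 3.9, 5.1]))  # keep
    print(review_model(predicted, [3.0, 1.0, 4.0, 1.0, 5.0]))  # suspend_and_retrain (r is about 0.35)
```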

Data Source Analytics

Data source analytics modules 150 and 160 include suitable hardware, software, and firmware configured to produce and refine predictive analytic models to be used to produce the leading indicator output 180. That is, modules 150 and 160 take the predictor variables derived from the various data sources (i.e., past data) and build a model for predicting the value of a target variable (also based on past, historical data) associated with an economic activity metric. The trained model is then used to predict, using current or contemporaneous information from the data sources, the future value of that economic activity metric.
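The train-on-the-past, predict-the-future pattern can be illustrated with the simplest possible model. It regresses the target at time t + lag on a single predictor observed at time t, using ordinary least squares. This stands in for the richer models listed below and is not the method of modules 150 and 160. The series names (container counts, retail sales) and their values are hypothetical.

```python
def fit_lagged_ols(predictor, target, lag):
    """Fit target[t + lag] = a + b * predictor[t] by ordinary least squares.

    Both series must share the same time index. Each predictor value is paired
    with the target value `lag` periods later. Returns (intercept a, slope b).
    """
    if len(predictor) != len(target):
        raise ValueError("series must be aligned and of equal length")
    if not 1 <= lag < len(predictor) - 1:
        raise ValueError("lag must leave at least two training pairs")
    xs, ys = predictor[:-lag], target[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def forecast(predictor, a, b):
    """Forecast the target one lag period after the latest predictor observation."""
    return a + b * predictor[-1]


if __name__ == "__main__":
    containers = [10, 12, 15, 11, 13, 16]     # hypothetical weekly container counts
    retail_sales = [24, 25, 29, 35, 27, 31]   # hypothetical sales, one week behind
    a, b = fit_lagged_ols(containers, retail_sales, lag=1)
    print(round(a, 6), round(b, 6))           # 5.0 2.0
    print(round(forecast(containers, a, b), 6))  # 37.0
```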

As a preliminary matter, the phrase “predictive analytics” is used in the sense of analytic models that are “forward-facing” and are evaluated based on how well they predict future behavior, rather than “descriptive analytics,” which are primarily “backward-facing” techniques meant to characterize the nature of the data in the simplest way possible. Thus, for example, Occam's razor and descriptive analytics might suggest that a dataset can be fitted in a manner that produces reasonable R² and correlation values using a simple linear regression model, while that model may not be as proficient at actually predicting future values when compared to a heterogeneous ensemble model that combines decision trees, neural networks, and other models into a single predictor or series of predictors.
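The contrast between forward-facing and backward-facing evaluation can be shown with a chronological holdout. In this setup the model is fitted on the earliest observations and scored only on later ones. The sketch fits a linear trend, chosen purely for illustration, to a hypothetical series that rises and then turns down. The in-sample R² is perfect, yet the out-of-sample error is large.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def rmse(pred, actual):
    """Root-mean-square error of predictions against actual values."""
    return sqrt(sum((p - y) ** 2 for p, y in zip(pred, actual)) / len(actual))


def r_squared(pred, actual):
    """Coefficient of determination of predictions against actual values."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for p, y in zip(pred, actual))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def chronological_holdout(series, train_fraction=0.75):
    """Fit on the earliest observations and score on strictly later ones."""
    cut = int(len(series) * train_fraction)
    t = list(range(len(series)))
    a, b = fit_line(t[:cut], series[:cut])
    fitted = [a + b * x for x in t[:cut]]
    forecast = [a + b * x for x in t[cut:]]
    return {
        "in_sample_r2": r_squared(fitted, series[:cut]),          # descriptive fit
        "out_of_sample_rmse": rmse(forecast, series[cut:]),       # predictive skill
    }


if __name__ == "__main__":
    print(chronological_holdout([1, 2, 3, 4, 5, 6, 5, 4]))
    # in-sample R^2 is 1.0, while out-of-sample RMSE is sqrt(10), about 3.16
```

Under this criterion, a more complex model such as an ensemble is preferred only if it lowers the out-of-sample error.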

In accordance with the present invention, data source analytics modules 150 and 160 are implemented as one or more machine learning and deep learning models that undergo supervised, unsupervised, semi-supervised, reinforcement, or assisted learning and perform classification (e.g., binary or multiclass classification), regression, clustering, dimensionality reduction, and/or similar tasks.

Examples of the models that may be implemented by modules 150 and 160 include, without limitation, artificial neural networks (ANN) (such as recurrent neural networks (RNN) and convolutional neural networks (CNN)), decision tree models (such as classification and regression trees (CART)), ensemble learning models (such as boosting, bootstrapped aggregation, gradient boosting machines, and random forests), Bayesian network models (e.g., naive Bayes), principal component analysis (PCA), support vector machines (SVM), neighbor-based and clustering models (such as K-nearest-neighbor, K-means, expectation maximization, hierarchical clustering, etc.), and linear discriminant analysis models.

In accordance with various embodiments, CNN techniques are applied to those data sources 101 that include imaging, video, and/or audio data. In this way, object detection and classification can be performed. For example, publicly available imaging data may be analyzed to determine the number, class, and origin of trucks traveling on a roadway at a particular time (e.g., Amazon Prime trucks, FedEx or UPS vehicles, and the like). A trained CNN may also be used to observe marine vessels in the vicinity of a port (as described above) and determine the number and type of offloaded shipping containers. In yet other embodiments, aerial image data of parking lots or other public spaces may be analyzed to perform object detection and classification of consumer vehicles. Security cameras in a retail shopping mall or a commercial office complex could be used to determine the number and type of shoppers or employees arriving at various times and to match that information with the types of bags they are carrying, the clothes they are wearing, their ages, the vehicles they are driving, etc.

Data Warehouse

Data warehouse 140 is configured to store the various structured and unstructured data generated or otherwise processed by data preparation module 130, comparison module 170, and data source analytics modules 150 and 160. In that regard, data warehouse 140 may be implemented using a variety of known data storage paradigms, including, for example, a relational database management system (RDBMS) such as Oracle, MySQL, Microsoft SQL Server, PostgreSQL, or the like. Data warehouse 140 may also be implemented using NoSQL databases, distributed databases, schema-free systems (e.g., MongoDB), Hadoop, and/or any other data storage paradigm now known or later developed.
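As a small relational illustration, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is a stand-in for the RDBMS products listed above, not one of them. The `observations` table, its columns, and the JSON-encoded metadata column are a hypothetical schema.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    id          INTEGER PRIMARY KEY,
    source_id   TEXT NOT NULL,   -- which data source produced the value
    observed_at TEXT NOT NULL,   -- ISO 8601 timestamp
    variable    TEXT NOT NULL,   -- e.g. 'vehicle_count'
    value       REAL,
    metadata    TEXT             -- JSON-encoded metadata record
);
CREATE INDEX IF NOT EXISTS idx_obs_var_time ON observations (variable, observed_at);
"""


def open_warehouse(path=":memory:"):
    """Open (or create) the warehouse and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_observation(conn, source_id, observed_at, variable, value, metadata):
    """Store one observation with its metadata serialized as JSON."""
    conn.execute(
        "INSERT INTO observations (source_id, observed_at, variable, value, metadata) "
        "VALUES (?, ?, ?, ?, ?)",
        (source_id, observed_at, variable, value, json.dumps(metadata)),
    )
    conn.commit()


def series(conn, variable):
    """Return (timestamp, value) pairs for one variable in time order."""
    return conn.execute(
        "SELECT observed_at, value FROM observations WHERE variable = ? ORDER BY observed_at",
        (variable,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_warehouse()
    insert_observation(conn, "lot-601-cam", "2021-03-08T12:00:00Z", "vehicle_count", 63, {})
    insert_observation(conn, "lot-601-cam", "2021-03-01T12:00:00Z", "vehicle_count", 140, {})
    print(series(conn, "vehicle_count"))  # ordered by timestamp
```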

Microeconomic/Macroeconomic Indicators

Output 180 of system 100 may be any indicator or set of indicators that are capable of predicting, alone or in combination with other indicators, the state of a microeconomic and/or macroeconomic system (for example, a metric that characterizes that system). This system may be global, national, regional, online, or any other subset of economic activity, may take a variety of forms, and may correspond to a wide range of attributes.

As with the data sources described above, output 180 will often take the form of discrete or continuous numerical data (e.g., integers, floating point numbers, etc.) or categorical data (e.g., unordered or ordered categories). Output 180 may include a series of time-varying scalar values, one or more matrices or tables, or higher-order tensors of numeric values. Output 180 may also be Boolean (e.g., True/False) or may contain the output of deep learning inference applied to image, video, or audio data.

In general, indicators of economic events can be categorized as leading indicators (which precede economic events), lagging indicators (which occur after economic events), or coincident indicators (which occur at substantially the same time as economic events). In accordance with the present invention, output 180 is preferably either a leading indicator or, when output 180 can be provided to a subscriber very quickly, a coincident indicator.
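One common heuristic for placing a candidate series into these categories is lagged cross-correlation. For each lag, the heuristic correlates the indicator at time t with the activity at time t + k and selects the lag k with the strongest correlation. A positive k suggests that the indicator leads, a negative k that it lags, and zero that it is coincident. This is a swapped-in technique offered for illustration. It assumes both series are sampled at the same regular interval and are roughly linearly related, and the example data is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; 0.0 if either window is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def best_lag(indicator, activity, max_lag):
    """Return the lag k in [-max_lag, max_lag] maximizing corr(indicator[t], activity[t + k])."""
    n = len(indicator)
    scores = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = indicator[: n - k], activity[k:]
        else:
            xs, ys = indicator[-k:], activity[: n + k]
        scores[k] = pearson(xs, ys)
    return max(scores, key=scores.get)


def classify(lag):
    """Map the best lag to the leading / lagging / coincident categories."""
    if lag > 0:
        return "leading"
    if lag < 0:
        return "lagging"
    return "coincident"


if __name__ == "__main__":
    indicator = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
    activity = [0, 0] + indicator[:-2]  # activity repeats the indicator two periods later
    k = best_lag(indicator, activity, max_lag=3)
    print(k, classify(k))  # 2 leading
```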

The semantic meaning of output 180 may vary depending upon context. In the shipping scenario, for example, output 180 might include the estimated weight of some good, such as automobiles, clothing, cellular phones, or the like, imported into the country during a certain timeframe. In a parking scenario, output 180 might be the number and/or types of vehicles parked at a set of observed locations. In the context of agricultural products, output 180 might include, for example, the percentage of hemp crops that appear to be unusually dark in aerial-view images.

Output 180 might also include information that indicates levels of construction activity, derived by monitoring construction vehicles, movements of heavy equipment, and physical changes to construction sites, which is then correlated to government-published housing starts information and the published reports of corporations involved in construction to create predictive metrics of building activity. The same techniques could be used to track the rate of lumber harvesting, lumber mill activities, livestock farming, road/bridge/building construction, surface mining or mineral collection, chemical refining processes, loading/shipping/unloading of goods at ports of entry, vehicles being sold from a retail car lot, vehicles in inventory after manufacturing, and cargo or passenger trains on an entire railway network, to name only a few. All of these can be correlated to historical published or private reports of economic activity to model and ultimately predict economic market trends.

Process Flow

FIG. 2 presents a combination block diagram/flow chart that illustrates, generally, the way information and data may flow through PAS 100 as illustrated in FIG. 1. More particularly, referring now to FIG. 2 in combination with FIG. 1, data sources 201 may include various raw data streams (211, 212) as well as composite data types such as, for example, summary statistics, trends, and algorithms (213), or images, video, and audio streams (214). Data sources 201 in FIG. 2 thus generally correspond to data sources 101 in FIG. 1.

Similarly, data processing and storage module 230 in FIG. 2 generally corresponds to data preparation module 130 and data warehouse 140, and is configured to clean, sort, filter, concatenate, process, and otherwise format the data from data sources 201. Subsequently, individual data streams are analyzed (module 240) and then the various predictive analytics models for those data sources are refined and fed back to module 230 (via module 250). In general, the phrase “dataset” as used herein refers to a portion or subset of data received from a data stream. In parallel, aggregate datasets are analyzed (module 260), refined (module 270), and provided back to module 230 for storage. Thus, modules 240 and 250 together correspond to module 150 in FIG. 1, and modules 260 and 270 generally correspond to module 160 of FIG. 1.
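Before aggregate datasets can be analyzed together, per-stream data generally has to be aligned on a common time key. The sketch below performs an inner join of several `{timestamp: value}` streams, keeping only the periods that every stream covers. The weekly keys and stream names are hypothetical. An inner join is one design choice among several; an outer join followed by imputation would keep more periods at the cost of filled-in values.

```python
def aggregate_streams(streams):
    """Inner-join several {timestamp: value} streams on their shared timestamps.

    Returns {timestamp: {stream_name: value}} for timestamps present in every
    stream, in sorted timestamp order.
    """
    if not streams:
        return {}
    common = set.intersection(*(set(s) for s in streams.values()))
    return {t: {name: s[t] for name, s in streams.items()} for t in sorted(common)}


if __name__ == "__main__":
    streams = {
        "containers": {"2021-W09": 410, "2021-W10": 395, "2021-W11": 430},
        "trucks": {"2021-W10": 1200, "2021-W11": 1310},
    }
    print(aggregate_streams(streams))  # only 2021-W10 and 2021-W11 are shared
```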

Finally, via a publishing module 280, the various leading indicators (generally corresponding to output 180 in FIG. 1) are provided to a set of subscribers/users 290. The published indicators, data models, and predictions may be provided in exchange for a fee (e.g., a one-time fee or subscription fee), or in exchange for access to privately owned data sources or other information of value to the process as illustrated.

FIG. 7 is a flowchart illustrating a method 700 in accordance with one embodiment of the present invention, and generally corresponds to the major processes illustrated in FIG. 2. More particularly, data is first received from a plurality of data sources (step 701), followed by data cleansing and preprocessing (702). Subsequently, predictive analytics are performed on individual data streams (703) and the resulting models and algorithms are refined (704). Metadata may then be assigned to the resulting data (705). The datasets are aggregated (706), and the predictive analytics models and algorithms for the aggregated data are then further refined (707) as previously described. Finally, the leading indicators are published for use (708).
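
The steps of method 700 can be sketched as a chain of functions. In the minimal Python sketch below, every function name is hypothetical and each step is a deliberately trivial placeholder (for instance, the per-stream "model" is only a mean level and a net change); it shows the order in which data moves through steps 701-708, not how any step would actually be implemented.

```python
from statistics import fmean

# Each function below is a hypothetical placeholder for one step of method 700.


def receive(sources: dict[str, list]) -> dict[str, list]:
    """Step 701: collect raw readings from each named data source."""
    return {name: list(values) for name, values in sources.items()}


def cleanse(streams: dict[str, list]) -> dict[str, list]:
    """Step 702: drop missing readings (a real pipeline would do far more)."""
    return {name: [v for v in vals if v is not None] for name, vals in streams.items()}


def analyze_stream(values: list[float]) -> dict:
    """Step 703: placeholder per-stream 'model' (mean level and net change)."""
    return {"mean": fmean(values), "trend": values[-1] - values[0]}


def refine(model: dict, evidence: list) -> dict:
    """Steps 704 and 707: placeholder refinement recording the evidence count."""
    return {**model, "n": len(evidence)}


def assign_metadata(source: str, model: dict) -> dict:
    """Step 705: attach metadata (here, just the source name) to a model."""
    return {"source": source, "model": model}


def aggregate(records: list[dict]) -> dict:
    """Step 706: combine per-stream results into one aggregate dataset."""
    return {"sources": [r["source"] for r in records],
            "trend": sum(r["model"]["trend"] for r in records)}


def publish(indicator: dict) -> dict:
    """Step 708: package the leading indicator for subscribers."""
    return {"leading_indicator": indicator}


def run_method_700(sources: dict[str, list]) -> dict:
    streams = cleanse(receive(sources))
    records = [assign_metadata(name, refine(analyze_stream(vals), vals))
               for name, vals in streams.items() if len(vals) >= 2]
    return publish(refine(aggregate(records), records))


print(run_method_700({"port": [1, None, 3, 4], "parking_lot": [10, 12]}))
```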

Generalized Hypothesis Generating and Testing System

As mentioned in the Background section above, predictive factors are likely to be buried in a vast array of fast-moving data streams that cannot be analyzed in the traditional manner by human beings. That is, the time-consuming process of applying traditional hypothesis testing and the scientific method to such data by humans is impracticable.

To remedy this, FIG. 8 illustrates a hypothesis generating and testing system (HGTS) 800. In general, HGTS 800 can be seen as a further abstraction of the predictive analytics system 100 of FIG. 1, in that it uses machine learning techniques to teach itself how to form the predictive analytics models described above. Thus, HGTS 800 can be seen as a form of general artificial intelligence operating in the field of economics.

In general, analytics engine 830 is configured to form its own hypotheses (e.g., “variable 1 is correlated to variables 2 and 3”) and subsequently test those hypotheses on cached data 831 and/or data sources 801. Thus, engine 830 is capable of performing its own planned experiments. The results and conclusions of its experiments (e.g., correlation coefficients, analysis of variance, etc.) are stored along with the hypothesis and model itself in a metadata format so that ongoing trends in model accuracy can be observed and used to further improve both model/algorithm accuracy and the hypothesis generating and testing system itself.

In order to facilitate the creation of hypotheses, a consistent metadata format is provided. This format minimizes the effort required for unassisted hypothesis generation and testing by properly presenting data for analysis, allowing the system to more effectively “compare apples to apples.” The structure of the metadata may vary, but in one embodiment the metadata includes data format, date range, sensor type, sensor accuracy, sensor precision, data collection methods, location of data collection, data preparation methods performed, image content ranges, NLP methods, data source/publication, and the like.

Referring now to FIG. 8, HGTS 800 generally includes an analytics engine 830 that receives data from data sources 801 (e.g., 811-814 and 821-823), including both public data 810 and private data 820. The nature of this data is the same as previously described in connection with FIGS. 1 and 2. Analytics engine 830 also includes a cached data store 831 for storage of data used in prior experiments.

Engine 830 may begin by performing an initial round of experiments with limited datasets to assess whether a particular hypothesis is likely to be successful. After initial correlations are found, engine 830 prioritizes a list of possible hypotheses and seeks to explore those. This is comparable to the separate training, validation, and testing steps used in training machine learning, deep learning, assisted learning, or other models.

As is known in the art, a leading indicator may be characterized by its direction relative to the attribute it is being tested against, i.e., procyclical (same direction), countercyclical (opposite direction), or acyclical (no correlation). As illustrated in FIG. 8, after an experiment has concluded, the results (840) determine how the experiment is treated. Specifically, if there is a statistically significant correlation, either positive or negative (i.e., procyclical or countercyclical), the metadata, model, and data for that experiment are stored in context-aware artificial intelligence database (CAAD) 850. If there is no correlation (i.e., acyclical), the entire experiment is discarded. If, however, the determined correlation and/or its statistical significance is borderline or otherwise weak, then that experiment may be stored within probationary database 851 for further refinement, with additional hypotheses to be tested when additional computation cycles or additional data become available. This is of critical importance because economic trends are never due entirely to a single factor but are rather the cumulative result of an extensive set of dependent and independent variables. As societies change with the advent of new technologies, new geopolitical issues, new trade deals, new manufacturing methods, new shipping methods, new laws or policies, new weather patterns, etc., new variables will begin to emerge with greater significance where they were previously irrelevant, and vice versa. By storing hypotheses in a probationary database, it is possible to trend correlations over time and begin to explore new factors that may have been previously unknown or undervalued in performing economic analyses. This would provide a major advantage for any analytics system, because variables of emerging relevance could be readily identified and prioritized.
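
The three-way routing of finished experiments can be sketched as follows. The significance thresholds (`alpha` for CAAD 850 and `weak_alpha` for the borderline band sent to probationary database 851) are assumed values chosen for illustration, the two databases are stood in for by plain Python lists, and treating anything above `weak_alpha` as acyclical is a simplification.

```python
from enum import Enum


class Cyclicality(Enum):
    PROCYCLICAL = "procyclical"          # moves in the same direction
    COUNTERCYCLICAL = "countercyclical"  # moves in the opposite direction
    ACYCLICAL = "acyclical"              # no usable correlation


def triage(r: float, p_value: float, alpha: float = 0.05,
           weak_alpha: float = 0.20) -> tuple[str, Cyclicality]:
    """Decide where a finished experiment goes (thresholds are assumptions)."""
    direction = Cyclicality.PROCYCLICAL if r > 0 else Cyclicality.COUNTERCYCLICAL
    if p_value < alpha:
        return "CAAD", direction              # significant: keep in CAAD 850
    if p_value < weak_alpha:
        return "probationary", direction      # borderline: probationary database 851
    return "discard", Cyclicality.ACYCLICAL   # treated as no correlation


def route(experiment: dict, caad: list, probationary: list) -> str:
    """Store the experiment record (metadata, model, data) in the matching list."""
    destination, cyclicality = triage(experiment["r"], experiment["p_value"])
    record = {**experiment, "cyclicality": cyclicality.value}
    if destination == "CAAD":
        caad.append(record)
    elif destination == "probationary":
        probationary.append(record)
    return destination


caad, probation = [], []
print(route({"hypothesis": "fuel prices vs. port calls", "r": -0.62, "p_value": 0.003},
            caad, probation))
```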

The above systems and methods may be used in scenarios where regional epidemics, global pandemics, military or political conflicts, or socioeconomic unrest arise, in order to measure precisely the impact on economic activity trends associated with changes in manufacturing, processing, shipping, storage, distribution, and tourism by region, by economic sector, and by company. Such information may be a leading indicator used to make more informed investment decisions.

The successful experiments stored within CAAD 850 can then be reused or incorporated into subsequent experiments by analytics engine 830. Similarly, analytics engine 830 may choose to recall an experiment from probationary database 851 and refine it for further testing, perhaps using a larger or less noisy dataset, or changing dependent/independent variables.
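
Recalling probationary experiments for further testing could look like the sketch below. The `retest` callable is a hypothetical hook that re-runs an experiment (for example on a larger or less noisy dataset) and returns a new p-value; experiments that now clear the assumed `alpha` threshold are promoted, and the rest stay on probation. The hypotheses and p-values shown are made up for illustration.

```python
from typing import Callable


def recheck_probationary(probationary: list[dict],
                         retest: Callable[[dict], float],
                         alpha: float = 0.05) -> tuple[list[dict], list[dict]]:
    """Re-run each probationary experiment; return (promoted, still_probationary).

    `retest` is a hypothetical hook that repeats an experiment, e.g. on a larger
    or less noisy dataset, and returns its new p-value.
    """
    promoted, remaining = [], []
    for experiment in probationary:
        updated = {**experiment, "p_value": retest(experiment)}
        (promoted if updated["p_value"] < alpha else remaining).append(updated)
    return promoted, remaining


# Hypothetical re-test results keyed by hypothesis text.
new_p = {"ship arrivals lead retail sales": 0.01, "rainfall leads freight volume": 0.15}
probation = [{"hypothesis": h, "p_value": 0.12} for h in new_p]
promoted, remaining = recheck_probationary(probation, lambda e: new_p[e["hypothesis"]])
print([e["hypothesis"] for e in promoted])
```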

In conclusion, the foregoing describes various systems and methods for applying predictive analytics techniques to produce leading indicators of economic activity based on a range of available data sources. In some embodiments, those methods include applying such predictive analytics techniques to produce leading indicators based on public and/or private transportation data, for example, producing the leading indicators based on data sources associated with the direct observation of activity at one or more maritime facilities. In some embodiments, the method includes producing the leading indicators based on aerial view images of consumer vehicles. Also described are systems and methods for providing a fee-based subscription system for sharing such leading indicators with users. More broadly, the present subject matter also relates to a system for providing a consistent, semantic metadata structure associated with data and for generating and testing hypotheses utilizing machine learning, deep learning, and assisted learning techniques (e.g., via non-supervised learning).

In accordance with one embodiment, there is provided a predictive analytics system comprising: a plurality of datasets received from a corresponding plurality of data sources, wherein at least one of the plurality of datasets includes first sensor data derived from direct observation of activity within an environment; a data preparation module configured to assign metadata to the stored datasets; at least one predictive analytics module configured to train a machine learning model using the stored datasets as a source of predictor variables based on the assigned metadata, and to use historical information regarding a metric of economic activity as a target variable; and a publishing module configured to provide, to one or more subscribers, output data including a leading indicator of the target variable based on the trained machine learning model and contemporaneous information received from at least one of the data sources.

In accordance with one embodiment, the first sensor data includes imaging data documenting transportation activity in the environment. In various embodiments, the transportation activity is selected from the group consisting of: (a) the nature and activity of aircraft; (b) the nature and activity of automotive vehicles; and (c) the nature and activity of marine vessels.

In accordance with one embodiment, the transportation activity corresponds to the nature and behavior of marine vessels in the vicinity of a port; the machine learning model is configured to perform object detection and classification with respect to the marine vessels observed at the port; and the predictor variables include the respective types and motions of the marine vessels relative to the port.
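
Once a detector and tracker have produced labeled observations of vessels, the types and motions relative to the port could be turned into predictor variables roughly as follows. The `Detection` record and its fields, the majority-vote type label, the sample sightings, and the convention that a shrinking distance to the port means "inbound" are hypothetical; the object detection and classification step itself is assumed to happen upstream.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    """One hypothetical detector/tracker output for a vessel in one image."""
    track_id: str
    vessel_type: str     # classifier label, e.g. "container" or "tanker"
    t: float             # observation time in hours
    distance_km: float   # distance from the port entrance


def port_features(detections: list[Detection]) -> Counter:
    """Count tracked vessels by (majority) type and heading relative to the port."""
    tracks: dict[str, list[Detection]] = defaultdict(list)
    for d in detections:
        tracks[d.track_id].append(d)
    features: Counter = Counter()
    for obs in tracks.values():
        if len(obs) < 2:
            continue  # a single sighting says nothing about motion
        obs.sort(key=lambda d: d.t)
        label = Counter(d.vessel_type for d in obs).most_common(1)[0][0]
        delta = obs[-1].distance_km - obs[0].distance_km
        heading = "inbound" if delta < 0 else "outbound" if delta > 0 else "stationary"
        features[f"{label}:{heading}"] += 1
    return features


sightings = [
    Detection("A", "container", 0.0, 10.0), Detection("A", "container", 2.0, 4.0),
    Detection("B", "tanker", 0.0, 2.0), Detection("B", "tanker", 3.0, 8.0),
    Detection("C", "container", 1.0, 6.0),
]
print(port_features(sightings))
```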

In accordance with one embodiment, the plurality of datasets includes aerial images of the port; and the machine learning model is further configured to determine at least one of the acceleration and velocity of the marine vessels based on the aerial images.
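
Assuming vessel positions have already been extracted from successive georeferenced aerial images and expressed in metres in a local planar coordinate system, speed (the magnitude of velocity) and its rate of change follow from finite differences, as in the sketch below. The input format, function name, and sample fixes are assumptions.

```python
import math


def kinematics(track: list[tuple[float, float, float]]) -> tuple[list[float], list[float]]:
    """Estimate speeds and accelerations from (t_seconds, x_m, y_m) fixes.

    Speeds (m/s) are taken between consecutive fixes and assigned to the
    segment midpoints; accelerations (m/s^2) are differences of those speeds.
    """
    track = sorted(track)
    speeds, mid_times = [], []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("fixes must have distinct timestamps")
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
        mid_times.append((t0 + t1) / 2)
    accels = [(v1 - v0) / (m1 - m0)
              for v0, v1, m0, m1 in zip(speeds, speeds[1:], mid_times, mid_times[1:])]
    return speeds, accels


# Hypothetical fixes from three aerial images taken 10 s apart.
speeds, accels = kinematics([(0, 0, 0), (10, 30, 40), (20, 90, 120)])
print(speeds, accels)
```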

In accordance with one embodiment, the plurality of datasets includes elevation view images of the port; and the machine learning model is further configured to determine a cargo weight for the marine vessels based on the vertical position of the vessels relative to a waterline.
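
One standard way to relate a vessel's vertical position to its load is Archimedes' principle: for a roughly wall-sided hull, the added displacement is approximately water density times waterplane area times the additional sinkage. The sketch below applies that textbook approximation; it is not stated to be the method of this embodiment, and its inputs (a light-ship freeboard reference, the observed freeboard measured from elevation view images, and a waterplane area, e.g., looked up by vessel class) and the sample values are assumptions.

```python
SEAWATER_DENSITY_T_PER_M3 = 1.025  # typical seawater density; an assumption


def cargo_tonnes(light_freeboard_m: float, observed_freeboard_m: float,
                 waterplane_area_m2: float,
                 density_t_per_m3: float = SEAWATER_DENSITY_T_PER_M3) -> float:
    """Estimate load (tonnes) from how far a hull sits below its light-ship waterline.

    Wall-sided approximation: added displacement = density * waterplane area * sinkage.
    """
    sinkage_m = light_freeboard_m - observed_freeboard_m
    return max(0.0, density_t_per_m3 * waterplane_area_m2 * sinkage_m)


# Hypothetical vessel: 12 m freeboard when empty, 8 m observed, 10,000 m^2 waterplane.
print(round(cargo_tonnes(12.0, 8.0, 10_000.0)))
```

Note that such an estimate captures everything aboard beyond the light-ship condition (fuel and ballast as well as cargo), so it is at best a proxy for cargo weight.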

In accordance with one embodiment, the transportation activity corresponds to the nature and behavior of trucking vehicles on a section of roadway; the machine learning model is configured to perform object detection and classification with respect to the trucking vehicles; and the predictor variables include the respective types and motions of the trucking vehicles on the section of roadway. In one embodiment, the data source associated with observation of the transportation activity is a publicly available traffic camera accessible via the Internet.
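
Traffic camera feeds typically show the same truck in many consecutive frames, so counting raw detections would overstate volume. Assuming an upstream tracker assigns each vehicle a persistent track ID, the sketch below counts each tracked vehicle once, by class, in the hour it was first seen. The input tuple layout, function name, and sample frames are hypothetical.

```python
from collections import Counter


def hourly_truck_counts(detections: list[tuple[float, str, str]]) -> Counter:
    """Count each tracked vehicle once, by class, in the hour it was first seen.

    `detections` holds (timestamp_seconds, track_id, vehicle_class) tuples from a
    detector/tracker run over camera frames; the layout is an assumption.
    """
    first_seen: dict[str, tuple[float, str]] = {}
    for ts, track_id, vehicle_class in detections:
        if track_id not in first_seen or ts < first_seen[track_id][0]:
            first_seen[track_id] = (ts, vehicle_class)
    counts: Counter = Counter()
    for ts, vehicle_class in first_seen.values():
        counts[(int(ts // 3600), vehicle_class)] += 1  # key: (hour index, class)
    return counts


frames = [(10, "t1", "semi"), (20, "t1", "semi"), (3700, "t2", "box"),
          (3600.5, "t3", "semi"), (30, "t4", "semi")]
print(hourly_truck_counts(frames))
```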

In accordance with one embodiment, the transportation activity corresponds to the nature and behavior of automotive vehicles parked within a parking lot adjacent a business; the machine learning model is configured to perform object detection and classification with respect to the automotive vehicles; and the predictor variables include the respective types and number of automotive vehicles detected within the parking lot over time.
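
Parking lot imagery may arrive at irregular times, so one simple way to build a predictor series is to average the detected vehicle counts per day and compare days against a baseline. The sketch below assumes per-snapshot total counts from an upstream detector; the function names and sample snapshots are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def daily_occupancy(snapshots: list[tuple[datetime, int]]) -> dict:
    """Average the detected vehicle count per calendar day, so days with more
    frequent imagery do not contribute more samples to the series."""
    by_day = defaultdict(list)
    for taken_at, count in snapshots:
        by_day[taken_at.date()].append(count)
    return {day: fmean(counts) for day, counts in sorted(by_day.items())}


def pct_change(current: float, baseline: float) -> float:
    """Percentage change of a day's occupancy against a baseline day."""
    return 100.0 * (current - baseline) / baseline


snaps = [(datetime(2024, 1, 1, 9), 40), (datetime(2024, 1, 1, 15), 60),
         (datetime(2024, 1, 2, 12), 30)]
daily = daily_occupancy(snaps)
print(daily)
print(pct_change(daily[datetime(2024, 1, 2).date()], daily[datetime(2024, 1, 1).date()]))
```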

In accordance with one embodiment, the first datasets include information documenting agricultural activity, e.g., selected from the group consisting of (a) time-lapse satellite images of one or more farms, (b) publicly available crop-yield data, (c) regional climate data, (d) agricultural commodity market prices, and (e) crop storage images.

In accordance with one embodiment, the predictive analytics module trains the machine learning model, selects the predictor variables, and determines the leading indicator using a generalized hypothesis generating and testing system (HGTS) and the assigned metadata.

Embodiments of the present disclosure may be described herein in terms of functional and/or logical block components and various processing steps. It should be appreciated that such block components may be realized by any number of hardware, software, and/or firmware components configured to perform the specified functions. For example, an embodiment of the present disclosure may employ various integrated circuit components, e.g., memory elements, digital signal processing elements, logic elements, look-up tables, or the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices.

In addition, those skilled in the art will appreciate that embodiments of the present disclosure may be practiced in conjunction with any number of systems, and that the systems described herein are merely exemplary embodiments of the present disclosure. Further, the connecting lines shown in the various figures contained herein are intended to represent example functional relationships and/or physical couplings between the various elements. It should be noted that many alternative or additional functional relationships or physical connections may be present in an embodiment of the present disclosure.

As used herein, the terms “module” or “controller” refer to any hardware, software, firmware, electronic control component, processing logic, and/or processor device, individually or in any combination, including without limitation: application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), dedicated neural network devices (e.g., Google Tensor Processing Units), quantum computing devices, visual or image processing units, graphics processing units (GPUs), systems on chips (SOCs), central processing units (CPUs), microcontroller units (MCUs), electronic circuits, processors (shared, dedicated, or group) configured to execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.

As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any implementation described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations, nor is it intended to be construed as a model that must be literally duplicated.

While the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing various embodiments of the invention, it should be appreciated that the particular embodiments described above are only examples, and are not intended to limit the scope, applicability, or configuration of the invention in any way. To the contrary, various changes may be made in the function and arrangement of elements described without departing from the scope of the invention.

1. A predictive analytics system comprising: a plurality of datasets received from a corresponding plurality of data sources, wherein at least one of the plurality of datasets includes first sensor data, including imaging data, derived from direct observation of activity within an environment; a data preparation module, including a processing device, configured to assign metadata to the stored datasets; at least one predictive analytics module, including a processing device, configured to train first and second machine learning models using the stored datasets as a source of predictor variables based on the assigned metadata for each of the machine learning models, and to use historical information regarding a metric of economic activity as a target variable, wherein the first and second machine learning models differ in at least one of machine learning model type and assigned metadata; a hypothesis generating and testing system (HGTS) that trains the first and second machine learning models, selects the predictor variables for the machine learning models, and determines a leading indicator using the assigned metadata for the machine learning models, the HGTS including an analytics engine configured to: assign first and second statistical metrics to the first and second machine learning models, respectively; store the assigned metadata, the first machine learning model, and the predictor variables in a primary database in response to determining that the statistical metric assigned to the first machine learning model meets a predetermined level; store the assigned metadata, the second machine learning model, and the predictor variables in a second, probationary database in response to determining that the second machine learning model exhibits a statistical metric that does not meet the predetermined level; and select between the first and second machine learning models based on their respective statistical metrics; and a publishing module configured to provide, to one or more subscribers, selected model information and output data including the leading indicator of the target variable based on the selected machine learning model and contemporaneous information received from at least one of the data sources.
2. The predictive analytics system of claim 1, wherein the first sensor data includes at least one of audio data, imaging data, transportation equipment data, and location services data documenting transportation activity in the environment selected from the group consisting of: (a) type and activity of aircraft; (b) type and activity of vehicles; (c) type and activity of marine vessels; and (d) type and activity of trains; further wherein the metadata includes at least: data collection methods for the first sensor data; data preparation methods for the first sensor data; and at least one of accuracy, precision, or resolution of the first sensor data.
3. The predictive analytics system of claim 2, wherein: the transportation activity corresponds to the type and behavior of at least one of aircraft, trains or marine vessels in the environment; each of the machine learning models is configured to perform object detection and classification with respect to the aircraft, train or marine vessels; and the predictor variables include the respective types and motions of the aircraft, train or marine vessels.
4. The predictive analytics system of claim 3, further including determining and classifying the number and type of cargo held by at least one of the aircraft, train or marine vessels.
5. The predictive analytics system of claim 3, further including determining the identity of at least one of the aircraft, train or marine vessels based on one or more visual characteristics of that aircraft, train or vessel.
4. The predictive analytics system of claim 3, wherein: the plurality of datasets includes at least one of satellite and aerial images of the environment; and each of the machine learning models is further configured to determine at least one of the acceleration, velocity, cargo volume, cargo type, and activity over time of the aircraft, train or marine vessels based on the at least one of satellite and aerial images.
5. The predictive analytics system of claim 3, wherein: the plurality of datasets includes elevation view images of the environment; and each of the machine learning models is further configured to determine a cargo weight for the marine vessels based on the vertical position of the vessels relative to a waterline.
6. The predictive analytics system of claim 3, wherein: the plurality of datasets includes at least one of audio data, image data, transportation equipment data, cargo monitoring data, location services data of the aircraft, train or marine vessel or the cargo on the aircraft, train or marine vessel; and each of the machine learning models is further configured to provide real time and historical information on at least one of the aircraft, train or marine vessels or the cargo held by at least one of the aircraft, train or marine vessels.
7. The predictive analytics system of claim 2, wherein: the transportation activity corresponds to the type and behavior of vehicles on a section of roadway; each of the machine learning models is configured to perform object detection and classification with respect to the vehicles, the occupants of the vehicles or the cargo of the vehicles; and the predictor variables include the respective types, motions, and activity of the vehicles over time.
8. The predictive analytics system of claim 7, wherein the data source associated with observation of the transportation activity is a mobile or stationary traffic camera or other imaging device.
9. The predictive analytics system of claim 7, wherein the data source associated with observation of the transportation activity is at least one of a satellite camera, an aerial camera, or other imaging device.
10. The predictive analytics system of claim 7, wherein the plurality of datasets includes at least one of location services data, transportation equipment data of the vehicle or the cargo being carried on the vehicle.
11. The predictive analytics system of claim 2, wherein: the transportation activity corresponds to the type and behavior of automotive vehicles parked within a region defined by at least one of a parking lot, a loading area adjacent a business, and a storage area; the machine learning model is configured to perform object detection and classification with respect to the automotive vehicles; and the predictor variables include the respective types, number, motions, and activity of the automotive vehicles, and of the occupants or cargo of the automotive vehicles, detected within the region over time.
12. The predictive analytics system of claim 1, wherein at least one of the datasets includes information documenting agricultural, mining, or harvesting activity.
13. The predictive analytics system of claim 12, wherein the datasets are selected from the group consisting of: time-lapse images comprising at least one of satellite and aerial images of one or more farms, mines, forests, pastures, storage or processing facilities; and the machine learning model is configured to perform object detection and classification with respect to at least one of the vehicles, machines or equipment used, the number and type of livestock, the volume and type of crops, the volume and type of material being mined, and the volume and type of lumber being harvested.
14. The predictive analytics system of claim 12, wherein the plurality of datasets includes information documenting agricultural crop-yield data; regional climate data; agricultural, mining, lumber commodity market prices; field images, pasture images, forest images, crop images, crop yield/productivity data; logging yield/productivity data; herding yield/productivity data; and mining yield/productivity data.
15. The predictive analytics system of claim 1, wherein the statistical metric is based on compiling and rank-ordering data correlations for the first and second machine learning models and determining statistical significance of the data correlations.
16. The predictive analytics system of claim 1, wherein the plurality of datasets includes at least one of location services and vehicle, occupant, pedestrian, worker or cargo activity logs.
17. A computer-implemented method for hypothesis generating and testing, comprising the steps of: training first and second machine learning models having corresponding first and second assigned metadata, wherein the first and second machine learning models are trained using a plurality of datasets received from a corresponding plurality of data sources, wherein at least one of the plurality of datasets includes first sensor data, including imaging data, derived from direct observation of activity within an environment; selecting first and second predictor variables, respectively, for the first and second machine learning models; determining a leading indicator of economic activity using the assigned metadata for the first and second machine learning models; assigning first and second statistical metrics to the first and second machine learning models; storing the first assigned metadata, the first machine learning model, and the first predictor variables in a primary database in response to determining that the statistical metric assigned to the first machine learning model meets a predetermined level; storing the second assigned metadata, the second machine learning model, and the second predictor variables in a second, probationary database in response to determining that the second machine learning model exhibits a statistical metric that does not meet the predetermined level; and selecting between the first and second machine learning models based on their respective statistical metrics.
18. The method of claim 17, wherein the statistical metric is based on compiling and rank-ordering data correlations for the first and second machine learning models based on statistical significance of the data correlations.
19. The method of claim 17, wherein the plurality of datasets includes at least one of location services and vehicle, occupant, pedestrian, worker or cargo activity logs.
20. The method of claim 17, wherein the plurality of datasets includes at least one of public or private economic datasets of a business, sector, local, regional, or global nature.